Global Bandits
Abstract
Standard multi-armed bandits model decision problems in which the consequences of each action choice are unknown and independent of each other. But in a wide variety of decision problems – from drug dosage to dynamic pricing – the consequences (rewards) of different actions are correlated, so that selecting one action provides information about the consequences (rewards) of other actions as well. We propose and analyze a class of models of such decision problems; we call this class of models global bandits. When rewards across actions (arms) are sufficiently correlated we construct a greedy policy that achieves bounded regret, with a bound that depends on the true parameters of the problem. In the special case in which rewards of all arms are deterministic functions of a single unknown parameter, we construct a (more sophisticated) greedy policy that achieves bounded regret, with a bound that depends on the single true parameter of the problem. For this special case we also obtain a bound on regret that is independent of the true parameter; this bound is sub-linear, with an exponent that depends on the informativeness of the arms (which measures the strength of correlation between arm rewards).
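The special case described above, in which every arm's expected reward is a deterministic function of one unknown parameter, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the policy constructed in the paper: the parameter is assumed to lie in [0, 1], the two reward functions (`th` and `1 - th`) and their inverses are hypothetical, the noise is assumed Gaussian, and the names `estimate_theta` and `run_greedy` are invented for illustration. Because each arm's reward function is invertible, every pull of any arm yields an estimate of the shared parameter, which is what makes a purely greedy rule plausible here.

```python
import random


def clip01(x):
    """Clamp x to [0, 1], the assumed range of the parameter."""
    return min(1.0, max(0.0, x))


def estimate_theta(counts, means, inverse_fns):
    """Count-weighted average of the per-arm estimates of theta.

    Each arm's sample mean is inverted through that arm's (assumed known)
    reward function; arms that were never pulled get weight zero.
    """
    total = sum(counts)
    weighted = sum(
        n * inv(clip01(m)) for n, m, inv in zip(counts, means, inverse_fns)
    )
    return weighted / total


def run_greedy(reward_fns, inverse_fns, theta_true, horizon, noise=0.1, seed=0):
    """Simulate a greedy policy; return (final estimate, cumulative pseudo-regret)."""
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k
    means = [0.0] * k
    best = max(f(theta_true) for f in reward_fns)
    regret = 0.0
    for t in range(horizon):
        if t == 0:
            arm = 0  # arbitrary first pull: no data yet
        else:
            theta_hat = estimate_theta(counts, means, inverse_fns)
            # Greedy step: play the arm that looks best under the estimate.
            arm = max(range(k), key=lambda i: reward_fns[i](theta_hat))
        expected = reward_fns[arm](theta_true)
        reward = expected + rng.gauss(0.0, noise)  # assumed Gaussian noise
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        regret += best - expected
    return estimate_theta(counts, means, inverse_fns), regret


# Hypothetical two-arm instance: arm 0 pays theta, arm 1 pays 1 - theta.
reward_fns = [lambda th: th, lambda th: 1.0 - th]
inverse_fns = [lambda m: m, lambda m: 1.0 - m]
theta_hat, regret = run_greedy(reward_fns, inverse_fns, theta_true=0.8, horizon=2000)
print(f"estimate of theta: {theta_hat:.3f}, cumulative pseudo-regret: {regret:.2f}")
```

In this toy instance even a pull of the worse arm sharpens the estimate of the parameter, so the policy needs no separate exploration phase. That is the intuition behind bounded regret when arms are informative about one another; the sketch does not reproduce the paper's regret bounds.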
Similar resources
Asymptotic optimal control of multi-class restless bandits
We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable process whose state evolution depends on whether or not the bandit is made active. The aim is to find a control that determines at each decision epoch which bandits to make active in order to minimize the overall average cost associated to the states the bandits are in. Sinc...
Asymptotically optimal priority policies for indexable and non-indexable restless bandits
We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property an...
Linear Contextual Bandits with Global Constraints and Objective
We consider the linear contextual bandit problem with global convex constraints and a concave objective function. In each round, the outcome of pulling an arm is a vector that depends linearly on the context of that arm. The global constraints require the average of these vectors to lie in a certain convex set. The objective is a concave function of this average vector. This probl...
Catching Bandits and Only Bandits: Privacy-Preserving Intersection Warrants for Lawful Surveillance
Motivated in part by the Snowden revelations, we address the question of whether intelligence and law enforcement agencies can gather actionable, relevant information about unknown electronic targets without conducting dragnet surveillance. We formulate principles that we believe effective, lawful surveillance protocols should adhere to in an era of big data and global communication networks. We...
Resourceful Contextual Bandits
We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and ...
Journal:
- CoRR
Volume abs/1503.08370
Publication date: 2015